Overview

Dataset statistics

Number of variables: 21
Number of observations: 10223
Missing cells: 90
Missing cells (%): < 0.1%
Duplicate rows: 75
Duplicate rows (%): 0.7%
Total size in memory: 1.6 MiB
Average record size in memory: 168.0 B
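
The percentages and sizes above can be cross-checked from the raw counts. The sketch below only re-derives the reported figures; it assumes (this is not stated in the report) that the missing-cell percentage is taken over variables × observations and that the total size is the average record size times the number of observations.

```python
# Values copied from the "Dataset statistics" block above.
n_variables = 21
n_observations = 10223
missing_cells = 90
duplicate_rows = 75
avg_record_bytes = 168.0

# Assumption: percentages are relative to all cells / all rows.
total_cells = n_variables * n_observations
missing_pct = 100 * missing_cells / total_cells
duplicate_pct = 100 * duplicate_rows / n_observations

# Assumption: total memory = average record size * number of records.
total_mib = avg_record_bytes * n_observations / 2**20

print(f"total cells:    {total_cells}")
print(f"missing cells:  {missing_pct:.3f}%")
print(f"duplicate rows: {duplicate_pct:.1f}%")
print(f"total size:     {total_mib:.1f} MiB")
```

Under these assumptions the missing share is well below 0.1%, which is consistent with the report showing "< 0.1%" rather than a rounded value.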

Variable types

Numeric: 15
Categorical: 6

Warnings

Dataset has 75 (0.7%) duplicate rows [Duplicates]
Credit_Limit is highly correlated with Avg_Open_To_Buy [High correlation]
Avg_Open_To_Buy is highly correlated with Credit_Limit [High correlation]
Dependent_count has 910 (8.9%) zeros [Zeros]
Contacts_Count_12_mon has 418 (4.1%) zeros [Zeros]
Total_Revolving_Bal has 2483 (24.3%) zeros [Zeros]
Avg_Utilization_Ratio has 2483 (24.3%) zeros [Zeros]

Reproduction

Analysis started: 2021-07-13 17:20:59.194513
Analysis finished: 2021-07-13 17:22:15.225116
Duration: 1 minute and 16.03 seconds
Software version: pandas-profiling v2.12.0
Download configuration: config.yaml

Variables

CLIENTNUM
Real number (ℝ≥0)

Distinct10127
Distinct (%)99.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean739224456.9
Minimum708082083
Maximum828343083
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size80.0 KiB

Quantile statistics

Minimum708082083
5-th percentile709110820.5
Q1713036770.5
median717942033
Q3773181595.5
95-th percentile814216683
Maximum828343083
Range120261000
Interquartile range (IQR)60144825

Descriptive statistics

Standard deviation36928379.3
Coefficient of variation (CV)0.04995557026
Kurtosis-0.621690104
Mean739224456.9
Median Absolute Deviation (MAD)6379200
Skewness0.9927338024
Sum7.557091623 × 10^12
Variance1.363705198 × 10^15
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
Value | Count | Frequency (%)
783554958 | 2 | < 0.1%
778348233 | 2 | < 0.1%
718619883 | 2 | < 0.1%
785432733 | 2 | < 0.1%
719712633 | 2 | < 0.1%
715259583 | 2 | < 0.1%
806160108 | 2 | < 0.1%
718813833 | 2 | < 0.1%
789172683 | 2 | < 0.1%
719661558 | 2 | < 0.1%
Other values (10117) | 10203 | 99.8%

Value | Count | Frequency (%)
708082083 | 1 | < 0.1%
708083283 | 1 | < 0.1%
708084558 | 1 | < 0.1%
708085458 | 1 | < 0.1%
708086958 | 1 | < 0.1%

Value | Count | Frequency (%)
828343083 | 1 | < 0.1%
828298908 | 1 | < 0.1%
828294933 | 1 | < 0.1%
828291858 | 1 | < 0.1%
828288333 | 1 | < 0.1%
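
Several of the CLIENTNUM figures above are derived from others, so they can be checked with simple arithmetic. The following is a minimal sketch that recomputes the range, IQR, coefficient of variation and variance from the reported minimum, maximum, quartiles, mean and standard deviation; the inputs are the rounded values printed in the report, so agreement is only expected up to rounding.

```python
# Reported values for CLIENTNUM, copied from the blocks above.
minimum, maximum = 708082083, 828343083
q1, q3 = 713036770.5, 773181595.5
mean, std = 739224456.9, 36928379.3

value_range = maximum - minimum   # Range
iqr = q3 - q1                     # Interquartile range (IQR)
cv = std / mean                   # Coefficient of variation (CV)
variance = std ** 2               # Variance is the squared standard deviation

print(value_range, iqr)
print(f"CV = {cv:.8f}")
print(f"variance = {variance:.6e}")
```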

Attrition_Flag
Categorical

Distinct2
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size80.0 KiB
Existing Customer
8590 
Attrited Customer
1633 

Length

Max length17
Median length17
Mean length17
Min length17

Characters and Unicode

Total characters173791
Distinct characters16
Distinct categories3
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st rowExisting Customer
2nd rowExisting Customer
3rd rowExisting Customer
4th rowExisting Customer
5th rowExisting Customer
Value | Count | Frequency (%)
Existing Customer | 8590 | 84.0%
Attrited Customer | 1633 | 16.0%
Histogram of lengths of the category
ValueCountFrequency (%)
customer10223
50.0%
existing8590
42.0%
attrited1633
 
8.0%

Most occurring characters

ValueCountFrequency (%)
t23712
13.6%
i18813
10.8%
s18813
10.8%
e11856
 
6.8%
r11856
 
6.8%
10223
 
5.9%
C10223
 
5.9%
u10223
 
5.9%
o10223
 
5.9%
m10223
 
5.9%
Other values (6)37626
21.7%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter143122
82.4%
Uppercase Letter20446
 
11.8%
Space Separator10223
 
5.9%

Most frequent character per category

ValueCountFrequency (%)
t23712
16.6%
i18813
13.1%
s18813
13.1%
e11856
8.3%
r11856
8.3%
u10223
7.1%
o10223
7.1%
m10223
7.1%
x8590
 
6.0%
n8590
 
6.0%
Other values (2)10223
7.1%
ValueCountFrequency (%)
C10223
50.0%
E8590
42.0%
A1633
 
8.0%
ValueCountFrequency (%)
10223
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin163568
94.1%
Common10223
 
5.9%

Most frequent character per script

ValueCountFrequency (%)
t23712
14.5%
i18813
11.5%
s18813
11.5%
e11856
7.2%
r11856
7.2%
C10223
 
6.2%
u10223
 
6.2%
o10223
 
6.2%
m10223
 
6.2%
E8590
 
5.3%
Other values (5)29036
17.8%
ValueCountFrequency (%)
10223
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII173791
100.0%

Most frequent character per block

ValueCountFrequency (%)
t23712
13.6%
i18813
10.8%
s18813
10.8%
e11856
 
6.8%
r11856
 
6.8%
10223
 
5.9%
C10223
 
5.9%
u10223
 
5.9%
o10223
 
5.9%
m10223
 
5.9%
Other values (6)37626
21.7%

Customer_Age
Real number (ℝ)

Distinct48
Distinct (%)0.5%
Missing30
Missing (%)0.3%
Infinite0
Infinite (%)0.0%
Mean46.30569999
Minimum-59
Maximum73
Zeros0
Zeros (%)0.0%
Negative6
Negative (%)0.1%
Memory size80.0 KiB

Quantile statistics

Minimum-59
5-th percentile33
Q141
median46
Q352
95-th percentile60
Maximum73
Range132
Interquartile range (IQR)11

Descriptive statistics

Standard deviation8.35413348
Coefficient of variation (CV)0.1804126378
Kurtosis10.06792048
Mean46.30569999
Median Absolute Deviation (MAD)6
Skewness-0.9321881721
Sum471994
Variance69.7915462
MonotonicityNot monotonic
Histogram with fixed size bins (bins=48)
Value | Count | Frequency (%)
44 | 503 | 4.9%
49 | 501 | 4.9%
45 | 489 | 4.8%
46 | 489 | 4.8%
47 | 481 | 4.7%
43 | 474 | 4.6%
48 | 473 | 4.6%
50 | 454 | 4.4%
42 | 428 | 4.2%
51 | 400 | 3.9%
Other values (38) | 5501 | 53.8%

Value | Count | Frequency (%)
-59 | 2 | < 0.1%
-50 | 2 | < 0.1%
-41 | 2 | < 0.1%
26 | 78 | 0.8%
27 | 32 | 0.3%

Value | Count | Frequency (%)
73 | 1 | < 0.1%
70 | 1 | < 0.1%
68 | 2 | < 0.1%
67 | 4 | < 0.1%
66 | 4 | < 0.1%
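
The Customer_Age block reports both missing values (30) and negative values (6, with a minimum of -59). As an illustration of how such counts can be obtained, here is a small sketch; `age_quality_summary` and `sample_ages` are hypothetical names and made-up sample data, not part of the report or of pandas-profiling.

```python
from typing import Dict, Optional, Sequence


def age_quality_summary(ages: Sequence[Optional[float]]) -> Dict[str, Optional[float]]:
    """Count missing and negative entries and report the observed extremes."""
    present = [a for a in ages if a is not None]
    return {
        "missing": len(ages) - len(present),
        "negative": sum(1 for a in present if a < 0),
        "minimum": min(present) if present else None,
        "maximum": max(present) if present else None,
    }


# Hypothetical sample; None stands for a missing age (NaN in the report).
sample_ages = [45.0, 49.0, None, -59.0, 73.0, None, -41.0, 26.0]
summary = age_quality_summary(sample_ages)
print(summary)
```

Negative ages are almost certainly data-entry errors, so a summary like this is mainly useful for deciding which rows need review before modelling.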

Gender
Categorical

Distinct8
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Memory size80.0 KiB
F
5372 
M
4811 
MALE
 
8
Male
 
8
Femle
 
8
Other values (3)
 
16

Length

Max length6
Median length1
Mean length1.014477159
Min length1

Characters and Unicode

Total characters10371
Distinct characters10
Distinct categories2
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st rowM
2nd rowF
3rd rowM
4th rowfemale
5th rowMALE
Value | Count | Frequency (%)
F | 5372 | 52.5%
M | 4811 | 47.1%
MALE | 8 | 0.1%
Male | 8 | 0.1%
Femle | 8 | 0.1%
female | 6 | 0.1%
male | 6 | 0.1%
FEmale | 4 | < 0.1%
Histogram of lengths of the category
ValueCountFrequency (%)
f5372
52.5%
m4811
47.1%
male22
 
0.2%
female10
 
0.1%
femle8
 
0.1%

Most occurring characters

ValueCountFrequency (%)
F5384
51.9%
M4827
46.5%
e46
 
0.4%
l32
 
0.3%
m24
 
0.2%
a24
 
0.2%
E12
 
0.1%
A8
 
0.1%
L8
 
0.1%
f6
 
0.1%

Most occurring categories

ValueCountFrequency (%)
Uppercase Letter10239
98.7%
Lowercase Letter132
 
1.3%

Most frequent character per category

ValueCountFrequency (%)
F5384
52.6%
M4827
47.1%
E12
 
0.1%
A8
 
0.1%
L8
 
0.1%
ValueCountFrequency (%)
e46
34.8%
l32
24.2%
m24
18.2%
a24
18.2%
f6
 
4.5%

Most occurring scripts

ValueCountFrequency (%)
Latin10371
100.0%

Most frequent character per script

ValueCountFrequency (%)
F5384
51.9%
M4827
46.5%
e46
 
0.4%
l32
 
0.3%
m24
 
0.2%
a24
 
0.2%
E12
 
0.1%
A8
 
0.1%
L8
 
0.1%
f6
 
0.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII10371
100.0%

Most frequent character per block

ValueCountFrequency (%)
F5384
51.9%
M4827
46.5%
e46
 
0.4%
l32
 
0.3%
m24
 
0.2%
a24
 
0.2%
E12
 
0.1%
A8
 
0.1%
L8
 
0.1%
f6
 
0.1%
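
The Gender variable has eight distinct spellings for what appear to be two categories. The sketch below first lowercases the reported value counts, which reproduces the lowercase word table shown above (f, m, male, female, femle), and then folds them into two labels. The `CANONICAL` mapping is an assumption made for illustration, in particular the choice to treat "Femle" as a misspelling of "female".

```python
from collections import Counter

# Value counts copied from the Gender section of the report.
gender_counts = {
    "F": 5372, "M": 4811, "MALE": 8, "Male": 8,
    "Femle": 8, "female": 6, "male": 6, "FEmale": 4,
}

# Step 1: case-fold the labels, as in the report's lowercase word table.
lowercased = Counter()
for label, count in gender_counts.items():
    lowercased[label.lower()] += count

# Step 2: hypothetical mapping to two canonical labels (an assumption).
CANONICAL = {"f": "F", "female": "F", "femle": "F", "m": "M", "male": "M"}
cleaned = Counter()
for label, count in lowercased.items():
    cleaned[CANONICAL[label]] += count

print(dict(lowercased))
print(dict(cleaned))
```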

Dependent_count
Real number (ℝ≥0)

ZEROS

Distinct6
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean2.346180182
Minimum0
Maximum5
Zeros910
Zeros (%)8.9%
Negative0
Negative (%)0.0%
Memory size80.0 KiB

Quantile statistics

Minimum0
5-th percentile0
Q11
median2
Q33
95-th percentile4
Maximum5
Range5
Interquartile range (IQR)2

Descriptive statistics

Standard deviation1.29821862
Coefficient of variation (CV)0.5533328726
Kurtosis-0.6846162053
Mean2.346180182
Median Absolute Deviation (MAD)1
Skewness-0.02168742599
Sum23985
Variance1.685371584
MonotonicityNot monotonic
Histogram with fixed size bins (bins=6)
ValueCountFrequency (%)
32760
27.0%
22676
26.2%
11860
18.2%
41592
15.6%
0910
 
8.9%
5425
 
4.2%
ValueCountFrequency (%)
0910
 
8.9%
11860
18.2%
22676
26.2%
32760
27.0%
41592
15.6%
ValueCountFrequency (%)
5425
 
4.2%
41592
15.6%
32760
27.0%
22676
26.2%
11860
18.2%
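
Because Dependent_count takes only six values, its summary statistics can be recomputed directly from the value counts above. This sketch does so; it assumes the report uses the sample variance (dividing by n - 1), which is consistent with the reported figure.

```python
import math

# Value counts copied from the Dependent_count tables above.
value_counts = {0: 910, 1: 1860, 2: 2676, 3: 2760, 4: 1592, 5: 425}

n = sum(value_counts.values())
total = sum(value * count for value, count in value_counts.items())
mean = total / n

# Assumption: sample variance with n - 1 in the denominator.
variance = sum(count * (value - mean) ** 2
               for value, count in value_counts.items()) / (n - 1)
std = math.sqrt(variance)
zeros_pct = 100 * value_counts[0] / n

print(n, total)
print(f"mean={mean:.9f} variance={variance:.9f} std={std:.8f}")
print(f"zeros={zeros_pct:.1f}%")
```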

Education_Level
Categorical

Distinct9
Distinct (%)0.1%
Missing3
Missing (%)< 0.1%
Memory size80.0 KiB
GrAduate
3160 
High School
2032 
Unknown
1531 
Uneducated
1496 
college
1021 
Other values (4)
980 

Length

Max length13
Median length8
Mean length8.938062622
Min length6

Characters and Unicode

Total characters91347
Distinct characters27
Distinct categories4
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique2
Unique (%)< 0.1%

Sample

1st rowHigh School
2nd rowGrAduate
3rd rowGrAduate
4th rowHigh School
5th rowUneducated
Value | Count | Frequency (%)
GrAduate | 3160 | 30.9%
High School | 2032 | 19.9%
Unknown | 1531 | 15.0%
Uneducated | 1496 | 14.6%
college | 1021 | 10.0%
Post-Graduate | 519 | 5.1%
Doctorate | 459 | 4.5%
ghjefs | 1 | < 0.1%
shdjafs | 1 | < 0.1%
(Missing) | 3 | < 0.1%
Histogram of lengths of the category
ValueCountFrequency (%)
graduate3160
25.8%
school2032
16.6%
high2032
16.6%
unknown1531
12.5%
uneducated1496
12.2%
college1021
 
8.3%
post-graduate519
 
4.2%
doctorate459
 
3.7%
ghjefs1
 
< 0.1%
shdjafs1
 
< 0.1%

Most occurring characters

ValueCountFrequency (%)
e9173
 
10.0%
o8053
 
8.8%
d6672
 
7.3%
t6612
 
7.2%
a6154
 
6.7%
n6089
 
6.7%
u5175
 
5.7%
c5008
 
5.5%
r4138
 
4.5%
l4074
 
4.5%
Other values (17)30199
33.1%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter73888
80.9%
Uppercase Letter14908
 
16.3%
Space Separator2032
 
2.2%
Dash Punctuation519
 
0.6%

Most frequent character per category

ValueCountFrequency (%)
e9173
12.4%
o8053
10.9%
d6672
9.0%
t6612
8.9%
a6154
8.3%
n6089
8.2%
u5175
7.0%
c5008
 
6.8%
r4138
 
5.6%
l4074
 
5.5%
Other values (8)12740
17.2%
ValueCountFrequency (%)
G3679
24.7%
A3160
21.2%
U3027
20.3%
H2032
13.6%
S2032
13.6%
P519
 
3.5%
D459
 
3.1%
ValueCountFrequency (%)
2032
100.0%
ValueCountFrequency (%)
-519
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin88796
97.2%
Common2551
 
2.8%

Most frequent character per script

ValueCountFrequency (%)
e9173
 
10.3%
o8053
 
9.1%
d6672
 
7.5%
t6612
 
7.4%
a6154
 
6.9%
n6089
 
6.9%
u5175
 
5.8%
c5008
 
5.6%
r4138
 
4.7%
l4074
 
4.6%
Other values (15)27648
31.1%
ValueCountFrequency (%)
2032
79.7%
-519
 
20.3%

Most occurring blocks

ValueCountFrequency (%)
ASCII91347
100.0%

Most frequent character per block

ValueCountFrequency (%)
e9173
 
10.0%
o8053
 
8.8%
d6672
 
7.3%
t6612
 
7.2%
a6154
 
6.7%
n6089
 
6.7%
u5175
 
5.7%
c5008
 
5.5%
r4138
 
4.5%
l4074
 
4.5%
Other values (17)30199
33.1%

Marital_Status
Categorical

Distinct4
Distinct (%)< 0.1%
Missing0
Missing (%)0.0%
Memory size80.0 KiB
Married
4740 
Single
3971 
Unknown
760 
Divorced
752 

Length

Max length8
Median length7
Mean length6.685121784
Min length6

Characters and Unicode

Total characters68342
Distinct characters17
Distinct categories2
Distinct scripts1
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st rowMarried
2nd rowSingle
3rd rowMarried
4th rowUnknown
5th rowMarried
Value | Count | Frequency (%)
Married | 4740 | 46.4%
Single | 3971 | 38.8%
Unknown | 760 | 7.4%
Divorced | 752 | 7.4%
Histogram of lengths of the category
ValueCountFrequency (%)
married4740
46.4%
single3971
38.8%
unknown760
 
7.4%
divorced752
 
7.4%

Most occurring characters

ValueCountFrequency (%)
r10232
15.0%
i9463
13.8%
e9463
13.8%
n6251
9.1%
d5492
8.0%
M4740
6.9%
a4740
6.9%
S3971
 
5.8%
g3971
 
5.8%
l3971
 
5.8%
Other values (7)6048
8.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter58119
85.0%
Uppercase Letter10223
 
15.0%

Most frequent character per category

ValueCountFrequency (%)
r10232
17.6%
i9463
16.3%
e9463
16.3%
n6251
10.8%
d5492
9.4%
a4740
8.2%
g3971
 
6.8%
l3971
 
6.8%
o1512
 
2.6%
k760
 
1.3%
Other values (3)2264
 
3.9%
ValueCountFrequency (%)
M4740
46.4%
S3971
38.8%
U760
 
7.4%
D752
 
7.4%

Most occurring scripts

ValueCountFrequency (%)
Latin68342
100.0%

Most frequent character per script

ValueCountFrequency (%)
r10232
15.0%
i9463
13.8%
e9463
13.8%
n6251
9.1%
d5492
8.0%
M4740
6.9%
a4740
6.9%
S3971
 
5.8%
g3971
 
5.8%
l3971
 
5.8%
Other values (7)6048
8.8%

Most occurring blocks

ValueCountFrequency (%)
ASCII68342
100.0%

Most frequent character per block

ValueCountFrequency (%)
r10232
15.0%
i9463
13.8%
e9463
13.8%
n6251
9.1%
d5492
8.0%
M4740
6.9%
a4740
6.9%
S3971
 
5.8%
g3971
 
5.8%
l3971
 
5.8%
Other values (7)6048
8.8%

Income_Category
Categorical

Distinct6
Distinct (%)0.1%
Missing18
Missing (%)0.2%
Memory size80.0 KiB
Less than $40K
3585 
$40K - $60K
1806 
$80K - $120K
1556 
$60K - $80K
1420 
Unknown
1100 

Length

Max length14
Median length12
Mean length11.48593827
Min length7

Characters and Unicode

Total characters117214
Distinct characters22
Distinct categories7
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st row$60K - $80K
2nd rowLess than $40K
3rd row$80K - $120K
4th rowLess than $40K
5th row$60K - $80K
Value | Count | Frequency (%)
Less than $40K | 3585 | 35.1%
$40K - $60K | 1806 | 17.7%
$80K - $120K | 1556 | 15.2%
$60K - $80K | 1420 | 13.9%
Unknown | 1100 | 10.8%
$120K + | 738 | 7.2%
(Missing) | 18 | 0.2%
Histogram of lengths of the category
ValueCountFrequency (%)
5520
19.9%
40k5391
19.5%
less3585
13.0%
than3585
13.0%
60k3226
11.7%
80k2976
10.8%
120k2294
8.3%
unknown1100
 
4.0%

Most occurring characters

ValueCountFrequency (%)
17472
14.9%
$13887
11.8%
013887
11.8%
K13887
11.8%
s7170
 
6.1%
n6885
 
5.9%
45391
 
4.6%
-4782
 
4.1%
L3585
 
3.1%
e3585
 
3.1%
Other values (12)26683
22.8%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter31695
27.0%
Decimal Number30068
25.7%
Uppercase Letter18572
15.8%
Space Separator17472
14.9%
Currency Symbol13887
11.8%
Dash Punctuation4782
 
4.1%
Math Symbol738
 
0.6%

Most frequent character per category

ValueCountFrequency (%)
s7170
22.6%
n6885
21.7%
e3585
11.3%
t3585
11.3%
h3585
11.3%
a3585
11.3%
k1100
 
3.5%
o1100
 
3.5%
w1100
 
3.5%
ValueCountFrequency (%)
013887
46.2%
45391
 
17.9%
63226
 
10.7%
82976
 
9.9%
12294
 
7.6%
22294
 
7.6%
ValueCountFrequency (%)
K13887
74.8%
L3585
 
19.3%
U1100
 
5.9%
ValueCountFrequency (%)
$13887
100.0%
ValueCountFrequency (%)
17472
100.0%
ValueCountFrequency (%)
-4782
100.0%
ValueCountFrequency (%)
+738
100.0%

Most occurring scripts

ValueCountFrequency (%)
Common66947
57.1%
Latin50267
42.9%

Most frequent character per script

ValueCountFrequency (%)
K13887
27.6%
s7170
14.3%
n6885
13.7%
L3585
 
7.1%
e3585
 
7.1%
t3585
 
7.1%
h3585
 
7.1%
a3585
 
7.1%
U1100
 
2.2%
k1100
 
2.2%
Other values (2)2200
 
4.4%
ValueCountFrequency (%)
17472
26.1%
$13887
20.7%
013887
20.7%
45391
 
8.1%
-4782
 
7.1%
63226
 
4.8%
82976
 
4.4%
12294
 
3.4%
22294
 
3.4%
+738
 
1.1%

Most occurring blocks

ValueCountFrequency (%)
ASCII117214
100.0%

Most frequent character per block

ValueCountFrequency (%)
17472
14.9%
$13887
11.8%
013887
11.8%
K13887
11.8%
s7170
 
6.1%
n6885
 
5.9%
45391
 
4.6%
-4782
 
4.1%
L3585
 
3.1%
e3585
 
3.1%
Other values (12)26683
22.8%

Card_Category
Categorical

Distinct4
Distinct (%)< 0.1%
Missing39
Missing (%)0.4%
Memory size80.0 KiB
Blue
9486 
Silver
 
560
gold
 
118
platinum
 
20

Length

Max length9
Median length4
Mean length4.119795758
Min length4

Characters and Unicode

Total characters41956
Distinct characters17
Distinct categories3
Distinct scripts2
Distinct blocks1
The Unicode Standard assigns character properties to each code point, which can be used to analyse textual variables.

Unique

Unique0
Unique (%)0.0%

Sample

1st rowBlue
2nd rowBlue
3rd rowBlue
4th rowgold
5th rowSilver
Value | Count | Frequency (%)
Blue | 9486 | 92.8%
Silver | 560 | 5.5%
gold | 118 | 1.2%
platinum | 20 | 0.2%
(Missing) | 39 | 0.4%
Histogram of lengths of the category
ValueCountFrequency (%)
blue9486
93.1%
silver560
 
5.5%
gold118
 
1.2%
platinum20
 
0.2%

Most occurring characters

ValueCountFrequency (%)
l10184
24.3%
e10046
23.9%
u9506
22.7%
B9486
22.6%
i580
 
1.4%
S560
 
1.3%
v560
 
1.3%
r560
 
1.3%
g118
 
0.3%
o118
 
0.3%
Other values (7)238
 
0.6%

Most occurring categories

ValueCountFrequency (%)
Lowercase Letter31890
76.0%
Uppercase Letter10046
 
23.9%
Space Separator20
 
< 0.1%

Most frequent character per category

ValueCountFrequency (%)
l10184
31.9%
e10046
31.5%
u9506
29.8%
i580
 
1.8%
v560
 
1.8%
r560
 
1.8%
g118
 
0.4%
o118
 
0.4%
d118
 
0.4%
p20
 
0.1%
Other values (4)80
 
0.3%
ValueCountFrequency (%)
B9486
94.4%
S560
 
5.6%
ValueCountFrequency (%)
20
100.0%

Most occurring scripts

ValueCountFrequency (%)
Latin41936
> 99.9%
Common20
 
< 0.1%

Most frequent character per script

ValueCountFrequency (%)
l10184
24.3%
e10046
24.0%
u9506
22.7%
B9486
22.6%
i580
 
1.4%
S560
 
1.3%
v560
 
1.3%
r560
 
1.3%
g118
 
0.3%
o118
 
0.3%
Other values (6)218
 
0.5%
ValueCountFrequency (%)
20
100.0%

Most occurring blocks

ValueCountFrequency (%)
ASCII41956
100.0%

Most frequent character per block

ValueCountFrequency (%)
l10184
24.3%
e10046
23.9%
u9506
22.7%
B9486
22.6%
i580
 
1.4%
S560
 
1.3%
v560
 
1.3%
r560
 
1.3%
g118
 
0.3%
o118
 
0.3%
Other values (7)238
 
0.6%

Months_on_book
Real number (ℝ≥0)

Distinct44
Distinct (%)0.4%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean35.96126382
Minimum13
Maximum56
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size80.0 KiB

Quantile statistics

Minimum13
5-th percentile22
Q132
median36
Q340
95-th percentile50
Maximum56
Range43
Interquartile range (IQR)8

Descriptive statistics

Standard deviation7.986247353
Coefficient of variation (CV)0.2220791626
Kurtosis0.4041994131
Mean35.96126382
Median Absolute Deviation (MAD)4
Skewness-0.1036859509
Sum367632
Variance63.78014679
MonotonicityNot monotonic
Histogram with fixed size bins (bins=44)
ValueCountFrequency (%)
362491
24.4%
37363
 
3.6%
34358
 
3.5%
38349
 
3.4%
39344
 
3.4%
40336
 
3.3%
35320
 
3.1%
31319
 
3.1%
33308
 
3.0%
30302
 
3.0%
Other values (34)4733
46.3%
ValueCountFrequency (%)
1370
0.7%
1416
 
0.2%
1534
0.3%
1629
0.3%
1739
0.4%
ValueCountFrequency (%)
56107
1.0%
5542
 
0.4%
5455
0.5%
5378
0.8%
5265
0.6%

Total_Relationship_Count
Real number (ℝ≥0)

Distinct6
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean3.815024944
Minimum1
Maximum6
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size80.0 KiB

Quantile statistics

Minimum1
5-th percentile1
Q13
median4
Q35
95-th percentile6
Maximum6
Range5
Interquartile range (IQR)2

Descriptive statistics

Standard deviation1.553242522
Coefficient of variation (CV)0.4071382351
Kurtosis-1.005754799
Mean3.815024944
Median Absolute Deviation (MAD)1
Skewness-0.1627050103
Sum39001
Variance2.412562333
MonotonicityNot monotonic
Histogram with fixed size bins (bins=6)
ValueCountFrequency (%)
32328
22.8%
41929
18.9%
51912
18.7%
61886
18.4%
21257
12.3%
1911
 
8.9%
ValueCountFrequency (%)
1911
 
8.9%
21257
12.3%
32328
22.8%
41929
18.9%
51912
18.7%
ValueCountFrequency (%)
61886
18.4%
51912
18.7%
41929
18.9%
32328
22.8%
21257
12.3%

Months_Inactive_12_mon
Real number (ℝ≥0)

Distinct7
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean2.340408882
Minimum0
Maximum6
Zeros31
Zeros (%)0.3%
Negative0
Negative (%)0.0%
Memory size80.0 KiB

Quantile statistics

Minimum0
5-th percentile1
Q12
median2
Q33
95-th percentile4
Maximum6
Range6
Interquartile range (IQR)1

Descriptive statistics

Standard deviation1.011834545
Coefficient of variation (CV)0.4323323812
Kurtosis1.099324143
Mean2.340408882
Median Absolute Deviation (MAD)1
Skewness0.6344688189
Sum23926
Variance1.023809146
MonotonicityNot monotonic
Histogram with fixed size bins (bins=7)
ValueCountFrequency (%)
33871
37.9%
23317
32.4%
12256
22.1%
4443
 
4.3%
5179
 
1.8%
6126
 
1.2%
031
 
0.3%
ValueCountFrequency (%)
031
 
0.3%
12256
22.1%
23317
32.4%
33871
37.9%
4443
 
4.3%
ValueCountFrequency (%)
6126
 
1.2%
5179
 
1.8%
4443
 
4.3%
33871
37.9%
23317
32.4%

Contacts_Count_12_mon
Real number (ℝ≥0)

ZEROS

Distinct7
Distinct (%)0.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean2.449867945
Minimum0
Maximum6
Zeros418
Zeros (%)4.1%
Negative0
Negative (%)0.0%
Memory size80.0 KiB

Quantile statistics

Minimum0
5-th percentile1
Q12
median2
Q33
95-th percentile4
Maximum6
Range6
Interquartile range (IQR)1

Descriptive statistics

Standard deviation1.107773767
Coefficient of variation (CV)0.4521769305
Kurtosis0.004241125067
Mean2.449867945
Median Absolute Deviation (MAD)1
Skewness0.004228032285
Sum25045
Variance1.22716272
MonotonicityNot monotonic
Histogram with fixed size bins (bins=7)
ValueCountFrequency (%)
33410
33.4%
23264
31.9%
11507
14.7%
41394
13.6%
0418
 
4.1%
5176
 
1.7%
654
 
0.5%
ValueCountFrequency (%)
0418
 
4.1%
11507
14.7%
23264
31.9%
33410
33.4%
41394
13.6%
ValueCountFrequency (%)
654
 
0.5%
5176
 
1.7%
41394
13.6%
33410
33.4%
23264
31.9%

Credit_Limit
Real number (ℝ≥0)

HIGH CORRELATION

Distinct6205
Distinct (%)60.7%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean8654.292967
Minimum1438.3
Maximum34516
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size80.0 KiB

Quantile statistics

Minimum1438.3
5-th percentile1439.1
Q12560.5
median4559
Q311116
95-th percentile34516
Maximum34516
Range33077.7
Interquartile range (IQR)8555.5

Descriptive statistics

Standard deviation9098.411381
Coefficient of variation (CV)1.051317701
Kurtosis1.790025838
Mean8654.292967
Median Absolute Deviation (MAD)2606
Skewness1.661411428
Sum88472837
Variance82781089.67
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
34516514
 
5.0%
1438.3510
 
5.0%
1598718
 
0.2%
995918
 
0.2%
2398112
 
0.1%
249011
 
0.1%
373511
 
0.1%
622411
 
0.1%
746910
 
0.1%
20698
 
0.1%
Other values (6195)9100
89.0%
ValueCountFrequency (%)
1438.3510
5.0%
14392
 
< 0.1%
14401
 
< 0.1%
14412
 
< 0.1%
14421
 
< 0.1%
ValueCountFrequency (%)
34516514
5.0%
344961
 
< 0.1%
344581
 
< 0.1%
344271
 
< 0.1%
341981
 
< 0.1%

Total_Revolving_Bal
Real number (ℝ≥0)

ZEROS

Distinct1974
Distinct (%)19.3%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean1165.096351
Minimum0
Maximum2517
Zeros2483
Zeros (%)24.3%
Negative0
Negative (%)0.0%
Memory size80.0 KiB

Quantile statistics

Minimum0
5-th percentile0
Q1405.5
median1279
Q31785
95-th percentile2517
Maximum2517
Range2517
Interquartile range (IQR)1379.5

Descriptive statistics

Standard deviation814.811961
Coefficient of variation (CV)0.6993515687
Kurtosis-1.143610005
Mean1165.096351
Median Absolute Deviation (MAD)591
Skewness-0.1518827627
Sum11910780
Variance663918.5318
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
02483
 
24.3%
2517516
 
5.0%
148012
 
0.1%
196512
 
0.1%
172011
 
0.1%
166411
 
0.1%
156011
 
0.1%
143411
 
0.1%
125010
 
0.1%
165010
 
0.1%
Other values (1964)7136
69.8%
ValueCountFrequency (%)
02483
24.3%
1321
 
< 0.1%
1341
 
< 0.1%
1451
 
< 0.1%
1541
 
< 0.1%
ValueCountFrequency (%)
2517516
5.0%
25143
 
< 0.1%
25131
 
< 0.1%
25122
 
< 0.1%
25111
 
< 0.1%

Avg_Open_To_Buy
Real number (ℝ≥0)

HIGH CORRELATION

Distinct6813
Distinct (%)66.6%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean7489.196615
Minimum3
Maximum34516
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size80.0 KiB

Quantile statistics

Minimum3
5-th percentile481.1
Q11327.5
median3488
Q39927
95-th percentile32185.8
Maximum34516
Range34513
Interquartile range (IQR)8599.5

Descriptive statistics

Standard deviation9100.061551
Coefficient of variation (CV)1.215091821
Kurtosis1.780975063
Mean7489.196615
Median Absolute Deviation (MAD)2679
Skewness1.656710773
Sum76562057
Variance82811120.23
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
1438.3326
 
3.2%
3451699
 
1.0%
3199926
 
0.3%
7878
 
0.1%
7137
 
0.1%
9537
 
0.1%
4637
 
0.1%
7887
 
0.1%
7017
 
0.1%
8066
 
0.1%
Other values (6803)9723
95.1%
ValueCountFrequency (%)
31
< 0.1%
101
< 0.1%
142
< 0.1%
151
< 0.1%
241
< 0.1%
ValueCountFrequency (%)
3451699
1.0%
343621
 
< 0.1%
343021
 
< 0.1%
343001
 
< 0.1%
342971
 
< 0.1%

Total_Amt_Chng_Q4_Q1
Real number (ℝ≥0)

Distinct1158
Distinct (%)11.3%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean0.7632900323
Minimum0
Maximum3.397
Zeros5
Zeros (%)< 0.1%
Negative0
Negative (%)0.0%
Memory size80.0 KiB

Quantile statistics

Minimum0
5-th percentile0.464
Q10.631
median0.737
Q30.861
95-th percentile1.12
Maximum3.397
Range3.397
Interquartile range (IQR)0.23

Descriptive statistics

Standard deviation0.2275878692
Coefficient of variation (CV)0.2981669609
Kurtosis12.70114737
Mean0.7632900323
Median Absolute Deviation (MAD)0.114
Skewness2.038033024
Sum7803.114
Variance0.05179623822
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
0.79137
 
0.4%
0.71235
 
0.3%
0.74334
 
0.3%
0.71833
 
0.3%
0.73533
 
0.3%
0.69932
 
0.3%
0.72232
 
0.3%
0.78832
 
0.3%
0.74432
 
0.3%
0.63131
 
0.3%
Other values (1148)9892
96.8%
ValueCountFrequency (%)
05
< 0.1%
0.011
 
< 0.1%
0.0181
 
< 0.1%
0.0461
 
< 0.1%
0.0612
 
< 0.1%
ValueCountFrequency (%)
3.3972
< 0.1%
3.3552
< 0.1%
2.6751
< 0.1%
2.5941
< 0.1%
2.3681
< 0.1%

Total_Trans_Amt
Real number (ℝ≥0)

Distinct5033
Distinct (%)49.2%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean4375.045583
Minimum510
Maximum18484
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size80.0 KiB

Quantile statistics

Minimum510
5-th percentile1264
Q12124.5
median3880
Q34734
95-th percentile14203.8
Maximum18484
Range17974
Interquartile range (IQR)2609.5

Descriptive statistics

Standard deviation3394.397079
Coefficient of variation (CV)0.7758541059
Kurtosis3.927673233
Mean4375.045583
Median Absolute Deviation (MAD)1315
Skewness2.045084902
Sum44726091
Variance11521931.53
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
450911
 
0.1%
425311
 
0.1%
222910
 
0.1%
451810
 
0.1%
44989
 
0.1%
42209
 
0.1%
43139
 
0.1%
40429
 
0.1%
48699
 
0.1%
40379
 
0.1%
Other values (5023)10127
99.1%
ValueCountFrequency (%)
5101
< 0.1%
5301
< 0.1%
5631
< 0.1%
5691
< 0.1%
5941
< 0.1%
ValueCountFrequency (%)
184841
< 0.1%
179951
< 0.1%
177441
< 0.1%
176341
< 0.1%
176281
< 0.1%

Total_Trans_Ct
Real number (ℝ≥0)

Distinct126
Distinct (%)1.2%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean64.51345006
Minimum10
Maximum139
Zeros0
Zeros (%)0.0%
Negative0
Negative (%)0.0%
Memory size80.0 KiB

Quantile statistics

Minimum10
5-th percentile27
Q144
median67
Q380
95-th percentile105
Maximum139
Range129
Interquartile range (IQR)36

Descriptive statistics

Standard deviation23.641091
Coefficient of variation (CV)0.3664521271
Kurtosis-0.3890897601
Mean64.51345006
Median Absolute Deviation (MAD)17
Skewness0.1530652148
Sum659521
Variance558.9011838
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
81208
 
2.0%
71203
 
2.0%
75203
 
2.0%
69202
 
2.0%
82202
 
2.0%
76198
 
1.9%
77197
 
1.9%
70193
 
1.9%
74190
 
1.9%
78190
 
1.9%
Other values (116)8237
80.6%
ValueCountFrequency (%)
104
< 0.1%
112
 
< 0.1%
124
< 0.1%
136
0.1%
149
0.1%
ValueCountFrequency (%)
1391
 
< 0.1%
1381
 
< 0.1%
1341
 
< 0.1%
1321
 
< 0.1%
1316
0.1%

Total_Ct_Chng_Q4_Q1
Real number (ℝ≥0)

Distinct830
Distinct (%)8.1%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean0.7151215886
Minimum0
Maximum3.714
Zeros7
Zeros (%)0.1%
Negative0
Negative (%)0.0%
Memory size80.0 KiB

Quantile statistics

Minimum0
5-th percentile0.368
Q10.583
median0.703
Q30.821
95-th percentile1.077
Maximum3.714
Range3.714
Interquartile range (IQR)0.238

Descriptive statistics

Standard deviation0.2455565761
Coefficient of variation (CV)0.3433773781
Kurtosis16.40103587
Mean0.7151215886
Median Absolute Deviation (MAD)0.119
Skewness2.236630015
Sum7310.688
Variance0.06029803207
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
1173
 
1.7%
0.667172
 
1.7%
0.5162
 
1.6%
0.75157
 
1.5%
0.6117
 
1.1%
0.8102
 
1.0%
0.71494
 
0.9%
0.83386
 
0.8%
0.77870
 
0.7%
0.62564
 
0.6%
Other values (820)9026
88.3%
ValueCountFrequency (%)
07
0.1%
0.0281
 
< 0.1%
0.0291
 
< 0.1%
0.0381
 
< 0.1%
0.0531
 
< 0.1%
ValueCountFrequency (%)
3.7141
 
< 0.1%
3.5711
 
< 0.1%
3.51
 
< 0.1%
3.252
< 0.1%
33
< 0.1%

Avg_Utilization_Ratio
Real number (ℝ≥0)

ZEROS

Distinct964
Distinct (%)9.4%
Missing0
Missing (%)0.0%
Infinite0
Infinite (%)0.0%
Mean0.2747400959
Minimum0
Maximum0.999
Zeros2483
Zeros (%)24.3%
Negative0
Negative (%)0.0%
Memory size80.0 KiB

Quantile statistics

Minimum0
5-th percentile0
Q10.024
median0.175
Q30.502
95-th percentile0.792
Maximum0.999
Range0.999
Interquartile range (IQR)0.478

Descriptive statistics

Standard deviation0.2754291776
Coefficient of variation (CV)1.002508122
Kurtosis-0.7920669153
Mean0.2747400959
Median Absolute Deviation (MAD)0.175
Skewness0.7195038664
Sum2808.668
Variance0.0758612319
MonotonicityNot monotonic
Histogram with fixed size bins (bins=50)
ValueCountFrequency (%)
02483
 
24.3%
0.07344
 
0.4%
0.05734
 
0.3%
0.04833
 
0.3%
0.0630
 
0.3%
0.06129
 
0.3%
0.04529
 
0.3%
0.05929
 
0.3%
0.06928
 
0.3%
0.05327
 
0.3%
Other values (954)7457
72.9%
ValueCountFrequency (%)
02483
24.3%
0.0041
 
< 0.1%
0.0051
 
< 0.1%
0.0063
 
< 0.1%
0.0071
 
< 0.1%
ValueCountFrequency (%)
0.9991
< 0.1%
0.9951
< 0.1%
0.9941
< 0.1%
0.9921
< 0.1%
0.991
< 0.1%

Interactions

Correlations

Pearson's r

Pearson's correlation coefficient (r) is a measure of linear correlation between two variables. Its value lies between -1 and +1, with -1 indicating total negative linear correlation, 0 indicating no linear correlation, and +1 indicating total positive linear correlation. Furthermore, r is invariant under separate changes in location and scale of the two variables, implying that for a linear relationship the steepness of the line does not affect r (only its direction sets the sign).

To calculate r for two variables X and Y, one divides the covariance of X and Y by the product of their standard deviations.
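
The definition above can be written out directly. This is a minimal pure-Python sketch, not the code pandas-profiling runs; `pearson_r` is a name chosen here for illustration. The normalising factors of the covariance and the standard deviations cancel, so only sums of deviations are needed. The sample data are the Credit_Limit and Avg_Open_To_Buy values from the first ten rows of the dataset.

```python
import math
from typing import Sequence


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Covariance of x and y divided by the product of their standard deviations."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    mean_x = sum(x) / n
    mean_y = sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("r is undefined for a constant variable")
    return cov / math.sqrt(var_x * var_y)


# Credit_Limit and Avg_Open_To_Buy from the first ten sample rows.
credit_limit = [12691.0, 8256.0, 3418.0, 3313.0, 4716.0,
                4010.0, 34516.0, 29081.0, 22352.0, 11656.0]
open_to_buy = [11914.0, 7392.0, 3418.0, 796.0, 4716.0,
               2763.0, 32252.0, 27685.0, 19835.0, 9979.0]
print(pearson_r(credit_limit, open_to_buy))
```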

Spearman's ρ

Spearman's rank correlation coefficient (ρ) is a measure of monotonic correlation between two variables, and is therefore better at catching nonlinear monotonic correlations than Pearson's r. Its value lies between -1 and +1, with -1 indicating total negative monotonic correlation, 0 indicating no monotonic correlation, and +1 indicating total positive monotonic correlation.

To calculate ρ for two variables X and Y, one divides the covariance of the rank variables of X and Y by the product of their standard deviations.
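
As a sketch of this definition, the code below converts each variable to ranks and then applies the Pearson formula to the ranks. Tied values receive the average of the ranks they span, which is a common convention and an assumption here; the function names are illustrative only.

```python
import math
from typing import List, Sequence


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson_r(x: Sequence[float], y: Sequence[float]) -> float:
    """Covariance divided by the product of the standard deviations."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Pearson correlation of the rank variables of x and y."""
    return pearson_r(average_ranks(x), average_ranks(y))


# A monotonic but nonlinear relationship: rho is 1, Pearson's r is not.
xs = [1, 2, 3, 4]
ys = [1, 8, 27, 64]
print(spearman_rho(xs, ys), pearson_r(xs, ys))
```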

Kendall's τ

Like Spearman's rank correlation coefficient, the Kendall rank correlation coefficient (τ) measures ordinal association between two variables. Its value lies between -1 and +1, with -1 indicating total negative correlation, 0 indicating no correlation, and +1 indicating total positive correlation.

To calculate τ for two variables X and Y, one determines the number of concordant and discordant pairs of observations. τ is the difference between the number of concordant pairs and the number of discordant pairs, divided by the total number of pairs.
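
The pair-counting definition can be sketched as follows. This is the simple variant often called tau-a, which divides by all n(n - 1)/2 pairs and counts tied pairs as neither concordant nor discordant; library implementations commonly default to a tie-adjusted variant (tau-b), so results can differ when ties are present. `kendall_tau_a` is an illustrative name.

```python
from itertools import combinations
from typing import Sequence


def kendall_tau_a(x: Sequence[float], y: Sequence[float]) -> float:
    """(concordant - discordant) / total number of pairs."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")
    concordant = discordant = 0
    for i, j in combinations(range(n), 2):
        direction = (x[i] - x[j]) * (y[i] - y[j])
        if direction > 0:
            concordant += 1   # both variables order the pair the same way
        elif direction < 0:
            discordant += 1   # the variables order the pair in opposite ways
    total_pairs = n * (n - 1) // 2
    return (concordant - discordant) / total_pairs


# One swapped neighbour: 5 concordant pairs, 1 discordant pair out of 6.
print(kendall_tau_a([1, 2, 3, 4], [1, 3, 2, 4]))
```

The double loop is quadratic in the number of observations, which is fine for illustration but slow for a dataset of this size.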

Phik (φk)

Phik (φk) is a new and practical correlation coefficient that works consistently between categorical, ordinal and interval variables, captures non-linear dependency and reverts to the Pearson correlation coefficient in case of a bivariate normal input distribution. There is extensive documentation available here.

Cramér's V (φc)

Cramér's V is an association measure for nominal random variables. The coefficient ranges from 0 to 1, with 0 indicating independence and 1 indicating perfect association. The empirical estimators used for Cramér's V have been shown to be biased, even for large samples. We use the bias-corrected measure proposed by Bergsma in 2013.
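
A minimal sketch of the bias-corrected estimator follows, using the commonly cited form of Bergsma's correction: the squared phi coefficient and the numbers of rows and columns are each adjusted by a term of order 1/(n - 1). This is an illustration written for this report, not the library's own code, and `cramers_v_corrected` is a hypothetical name.

```python
import math
from collections import Counter
from typing import Hashable, Sequence


def cramers_v_corrected(x: Sequence[Hashable], y: Sequence[Hashable]) -> float:
    """Bias-corrected Cramér's V for two nominal variables of equal length."""
    n = len(x)
    rows, cols = Counter(x), Counter(y)
    r, k = len(rows), len(cols)
    if n != len(y) or n < 2 or r < 2 or k < 2:
        return 0.0
    joint = Counter(zip(x, y))
    # Pearson chi-squared statistic of the contingency table.
    chi2 = 0.0
    for a, row_total in rows.items():
        for b, col_total in cols.items():
            expected = row_total * col_total / n
            chi2 += (joint[(a, b)] - expected) ** 2 / expected
    phi2 = chi2 / n
    # Bias corrections for phi squared and for the table dimensions.
    phi2_corr = max(0.0, phi2 - (k - 1) * (r - 1) / (n - 1))
    r_corr = r - (r - 1) ** 2 / (n - 1)
    k_corr = k - (k - 1) ** 2 / (n - 1)
    denominator = min(k_corr - 1, r_corr - 1)
    return math.sqrt(phi2_corr / denominator) if denominator > 0 else 0.0


# Perfect association versus exact independence.
perfect = cramers_v_corrected(["a"] * 50 + ["b"] * 50, ["u"] * 50 + ["v"] * 50)
independent = cramers_v_corrected(["a", "a", "b", "b"], ["u", "v", "u", "v"])
print(perfect, independent)
```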

Missing values

A simple visualization of nullity by column.
The nullity matrix is a data-dense display that lets you quickly pick out patterns in data completion.
The correlation heatmap measures nullity correlation: how strongly the presence or absence of one variable affects the presence of another.
The dendrogram allows you to more fully correlate variable completion, revealing trends deeper than the pairwise ones visible in the correlation heatmap.
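As a rough sketch of what the count and heatmap views summarize, the snippet below counts missing values per column and computes the correlation between two missingness indicators. The values are the Customer_Age and Card_Category entries of the first ten sample rows, with NaN written as `None`; treating nullity correlation as Pearson's r on 0/1 indicators is an assumption of this sketch, and the helper names are made up.

```python
from math import sqrt

# First ten sample rows; None stands in for NaN.
customer_age = [45.0, 49.0, 51.0, None, None, 44.0, 51.0, 32.0, None, 48.0]
card_category = ["Blue", "Blue", None, None, "Blue",
                 None, "gold", "Silver", "Blue", "Blue"]

def missing_indicator(column):
    """1 where the value is missing, 0 where it is present."""
    return [1 if value is None else 0 for value in column]

def pearson(a, b):
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    cov = sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b))
    var_a = sum((x - mean_a) ** 2 for x in a)
    var_b = sum((y - mean_b) ** 2 for y in b)
    return cov / sqrt(var_a * var_b)

age_missing = missing_indicator(customer_age)
card_missing = missing_indicator(card_category)

# Nullity by column: how many cells are missing in each variable.
missing_counts = {
    "Customer_Age": sum(age_missing),
    "Card_Category": sum(card_missing),
}
# Nullity correlation: do the two variables tend to be missing together?
nullity_corr = pearson(age_missing, card_missing)

print(missing_counts)
print(round(nullity_corr, 3))
```

A value near 0 means that missingness in one column says little about missingness in the other; values near +1 or -1 mean the columns tend to be missing together or alternately.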

Sample

First rows

index | CLIENTNUM | Attrition_Flag | Customer_Age | Gender | Dependent_count | Education_Level | Marital_Status | Income_Category | Card_Category | Months_on_book | Total_Relationship_Count | Months_Inactive_12_mon | Contacts_Count_12_mon | Credit_Limit | Total_Revolving_Bal | Avg_Open_To_Buy | Total_Amt_Chng_Q4_Q1 | Total_Trans_Amt | Total_Trans_Ct | Total_Ct_Chng_Q4_Q1 | Avg_Utilization_Ratio
0 | 768805383 | Existing Customer | 45.0 | M | 3 | High School | Married | $60K - $80K | Blue | 39 | 5 | 1 | 3 | 12691.0 | 777 | 11914.0 | 1.335 | 1144 | 42 | 1.625 | 0.061
1 | 818770008 | Existing Customer | 49.0 | F | 5 | GrAduate | Single | Less than $40K | Blue | 44 | 6 | 1 | 2 | 8256.0 | 864 | 7392.0 | 1.541 | 1291 | 33 | 3.714 | 0.105
2 | 713982108 | Existing Customer | 51.0 | M | 3 | GrAduate | Married | $80K - $120K | NaN | 36 | 4 | 1 | 0 | 3418.0 | 0 | 3418.0 | 2.594 | 1887 | 20 | 2.333 | 0.000
3 | 769911858 | Existing Customer | NaN | female | 4 | High School | Unknown | Less than $40K | NaN | 34 | 3 | 4 | 1 | 3313.0 | 2517 | 796.0 | 1.405 | 1171 | 20 | 2.333 | 0.760
4 | 709106358 | Existing Customer | NaN | MALE | 3 | Uneducated | Married | $60K - $80K | Blue | 21 | 5 | 1 | 0 | 4716.0 | 0 | 4716.0 | 2.175 | 816 | 28 | 2.500 | 0.000
5 | 713061558 | Existing Customer | 44.0 | M | 2 | GrAduate | Married | $40K - $60K | NaN | 36 | 3 | 1 | 2 | 4010.0 | 1247 | 2763.0 | 1.376 | 1088 | 24 | 0.846 | 0.311
6 | 810347208 | Existing Customer | 51.0 | M | 4 | Unknown | Married | $120K + | gold | 46 | 6 | 1 | 3 | 34516.0 | 2264 | 32252.0 | 1.975 | 1330 | 31 | 0.722 | 0.066
7 | 818906208 | Existing Customer | 32.0 | MALE | 0 | High School | Unknown | $60K - $80K | Silver | 27 | 2 | 2 | 2 | 29081.0 | 1396 | 27685.0 | 2.204 | 1538 | 36 | 0.714 | 0.048
8 | 710930508 | Existing Customer | NaN | MALE | 3 | Uneducated | Single | $60K - $80K | Blue | 36 | 5 | 2 | 0 | 22352.0 | 2517 | 19835.0 | 3.355 | 1350 | 24 | 1.182 | 0.113
9 | 719661558 | Existing Customer | 48.0 | M | 2 | GrAduate | Single | $80K - $120K | Blue | 36 | 6 | 3 | 3 | 11656.0 | 1677 | 9979.0 | 1.524 | 1441 | 32 | 0.882 | 0.144

Last rows

index | CLIENTNUM | Attrition_Flag | Customer_Age | Gender | Dependent_count | Education_Level | Marital_Status | Income_Category | Card_Category | Months_on_book | Total_Relationship_Count | Months_Inactive_12_mon | Contacts_Count_12_mon | Credit_Limit | Total_Revolving_Bal | Avg_Open_To_Buy | Total_Amt_Chng_Q4_Q1 | Total_Trans_Amt | Total_Trans_Ct | Total_Ct_Chng_Q4_Q1 | Avg_Utilization_Ratio
10213 | 720201033 | Attrited Customer | 53.0 | M | 2 | GrAduate | Married | $80K - $120K | Blue | 41 | 3 | 3 | 2 | 11669.0 | 2227 | 9442.0 | 0.622 | 720 | 23 | 0.353 | 0.191
10214 | 718039683 | Existing Customer | 52.0 | M | 2 | High School | Single | $40K - $60K | Blue | 45 | 3 | 1 | 2 | 13532.0 | 1300 | 12232.0 | 0.846 | 1521 | 29 | 1.417 | 0.096
10215 | 710806083 | Existing Customer | 50.0 | F | 1 | Uneducated | Unknown | Less than $40K | Silver | 36 | 4 | 2 | 2 | 11888.0 | 2090 | 9798.0 | 0.700 | 1137 | 30 | 0.765 | 0.176
10216 | 714752208 | Existing Customer | 42.0 | F | 1 | GrAduate | Married | Less than $40K | Blue | 36 | 5 | 1 | 2 | 2393.0 | 1985 | 408.0 | 0.788 | 978 | 16 | 0.778 | 0.830
10217 | 822063408 | Existing Customer | 43.0 | M | 4 | Unknown | Married | $40K - $60K | Blue | 39 | 5 | 1 | 2 | 6111.0 | 2517 | 3594.0 | 0.632 | 1221 | 16 | 2.200 | 0.412
10218 | 714490308 | Existing Customer | 57.0 | M | 4 | GrAduate | Married | $80K - $120K | Blue | 46 | 3 | 2 | 3 | 19270.0 | 1662 | 17608.0 | 1.186 | 1565 | 28 | 1.545 | 0.086
10219 | 713356833 | Existing Customer | 52.0 | M | 1 | GrAduate | Married | $80K - $120K | Blue | 43 | 3 | 1 | 4 | 3710.0 | 2517 | 1193.0 | 1.541 | 1578 | 34 | 0.700 | 0.678
10220 | 711402333 | Existing Customer | 47.0 | M | 2 | Doctorate | Single | Less than $40K | Blue | 37 | 2 | 1 | 3 | 3235.0 | 797 | 2438.0 | 0.541 | 1493 | 32 | 1.000 | 0.246
10221 | 708296883 | Existing Customer | 44.0 | F | 3 | GrAduate | Single | $40K - $60K | Blue | 38 | 3 | 1 | 0 | 11749.0 | 788 | 10961.0 | 0.688 | 1178 | 27 | 0.929 | 0.067
10222 | 719154483 | Existing Customer | 53.0 | M | 2 | High School | Divorced | $60K - $80K | Blue | 36 | 5 | 3 | 2 | 7753.0 | 2100 | 5653.0 | 0.839 | 991 | 26 | 0.733 | 0.271

Duplicate rows

Most frequent

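index | CLIENTNUM | Attrition_Flag | Customer_Age | Gender | Dependent_count | Education_Level | Marital_Status | Income_Category | Card_Category | Months_on_book | Total_Relationship_Count | Months_Inactive_12_mon | Contacts_Count_12_mon | Credit_Limit | Total_Revolving_Bal | Avg_Open_To_Buy | Total_Amt_Chng_Q4_Q1 | Total_Trans_Amt | Total_Trans_Ct | Total_Ct_Chng_Q4_Q1 | Avg_Utilization_Ratio | # duplicates
0 | 708296883 | Existing Customer | 44.0 | F | 3 | GrAduate | Single | $40K - $60K | Blue | 38 | 3 | 1 | 0 | 11749.0 | 788 | 10961.0 | 0.688 | 1178 | 27 | 0.929 | 0.067 | 2
1 | 708300483 | Attrited Customer | 66.0 | F | 0 | Doctorate | Married | Unknown | Blue | 56 | 5 | 4 | 3 | 7882.0 | 605 | 7277.0 | 1.052 | 704 | 16 | 0.143 | 0.077 | 2
2 | 708476808 | Existing Customer | 54.0 | M | 4 | Unknown | Divorced | $120K + | Blue | 36 | 6 | 3 | 2 | 33791.0 | 1960 | 31831.0 | 0.618 | 1047 | 31 | 0.824 | 0.058 | 2
3 | 708492558 | Existing Customer | 42.0 | M | 3 | Unknown | Married | $60K - $80K | Blue | 36 | 5 | 3 | 3 | 15088.0 | 865 | 14223.0 | 0.939 | 1107 | 21 | 1.625 | 0.057 | 2
4 | 708508758 | Attrited Customer | 62.0 | F | 0 | GrAduate | Married | Less than $40K | Blue | 49 | 2 | 3 | 3 | 1438.3 | 0 | 1438.3 | 1.047 | 692 | 16 | 0.600 | 0.000 | 2
5 | 710599683 | Existing Customer | 56.0 | M | 1 | college | Single | $80K - $120K | Blue | 36 | 3 | 6 | 0 | 11751.0 | 0 | 11751.0 | 3.397 | 1539 | 17 | 3.250 | 0.000 | 2
6 | 710806083 | Existing Customer | 50.0 | F | 1 | Uneducated | Unknown | Less than $40K | Silver | 36 | 4 | 2 | 2 | 11888.0 | 2090 | 9798.0 | 0.700 | 1137 | 30 | 0.765 | 0.176 | 2
7 | 710916408 | Existing Customer | 57.0 | M | 4 | GrAduate | Married | $60K - $80K | Blue | 39 | 4 | 3 | 0 | 8466.0 | 1886 | 6580.0 | 0.560 | 1342 | 32 | 0.391 | 0.223 | 2
8 | 711318033 | Existing Customer | 47.0 | M | 3 | GrAduate | Single | $60K - $80K | Blue | 36 | 4 | 2 | 2 | 2926.0 | 0 | 2926.0 | 0.676 | 1304 | 24 | 0.846 | 0.000 | 2
9 | 711402333 | Existing Customer | 47.0 | M | 2 | Doctorate | Single | Less than $40K | Blue | 37 | 2 | 1 | 3 | 3235.0 | 797 | 2438.0 | 0.541 | 1493 | 32 | 1.000 | 0.246 | 2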
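A duplicate-row count like the one in this table can be reproduced, in spirit, by treating each row as a tuple and counting identical tuples. The sketch below is only an illustration: it keeps three of the 21 columns, the CLIENTNUM values are taken from the tables in this report, and the way the rows are arranged in the list is invented for the example.

```python
from collections import Counter

# Each row reduced to (CLIENTNUM, Attrition_Flag, Customer_Age) for brevity.
rows = [
    (708296883, "Existing Customer", 44.0),
    (708300483, "Attrited Customer", 66.0),
    (708296883, "Existing Customer", 44.0),   # exact repeat of the first row
    (720201033, "Attrited Customer", 53.0),
    (708300483, "Attrited Customer", 66.0),   # exact repeat of the second row
]

# Count identical rows, then keep only those that occur more than once.
counts = Counter(rows)
duplicates = {row: n for row, n in counts.items() if n > 1}

# Most frequent first, ties broken by the row values themselves.
duplicates_sorted = sorted(duplicates.items(), key=lambda item: (-item[1], item[0]))

for row, n in duplicates_sorted:
    print(row, n)
```

Rows count as duplicates only when every field matches, so two records that differ in a single column (for example "M" versus "MALE") are not grouped together.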